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Abstract 

This paper presents an approach to estimating a hidden process in a continuous-time 
setting, where the hidden process is a diffusion. The approach is simply to minimize the 
negative log-likelihood of the hidden path, where the likehhood is expressed relative to 
Wiener measure. This negative log-likelihood is the action integral of the path, which 
we minimize by calculus of variations. We then perform an asymptotic maximum- 
likelihood analysis to understand better how the actual path is distributed around the 
least-action path; it turns out that the actual path can be expressed (approximately) 
as the sum of the least-action path and a zero-mean Gaussian process which can be 
specified quite explicitly. Numerical solution of the ODEs which arise from the calculus 
of variations is often feasible, but is complicated by the shooting nature of the problem, 
and the possibility that we have found a local but not global minimum. We analyze 
the situations when this happens, and provide effective numerical methods for studying 
this. We also show how the methodology works in a situation where the hidden positive 
diffusion acts as the random intensity of a point process which is observed; here too it 
is possible to estimate the hidden process. 

1 Introduction. 

The basic problem tackled in this paper is to try to estimate the hidden part (^t)o<t<T of a 
vectoi0 diffusion process Zt = [Xt; Yt] given the observations (yt)o<t<T- Of course, this is an 
idealized question, because in practice we would only observe the signal at discrete instants of 
time, which for simplicity we will assume are equally spaced with a spacing of h > 0. It would 
then in principle be possible to write down the likelihoocH of {Znh)o<n<N, and maximize this 
over the unobserved variables {Xnh)o<n<N, with the y- values known and fixed. 



^ We use the Scilab/Matlab notation, where z = [x; y] denotes the column vector z formed by stacking the 
column vector x above the column vector y. 
^ We suppose that T = Nh. 
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There are two quite substantial obstacles on this path. The first is that (except in a few 
special examples) there is no closed-form expression for the transition density of a general 
diffusion, so evaluating the likelihood is already problematic. The second is that we are 
faced with a maximization over a space of dimension proportional to A^, a number which will 
be large if h is small, as we envisage. The approach of this paper avoids these difficulties 
completely by passing to a limiting form of the problem in which the (negative of the) log- 
likelihood expression to be maximized converges to an integral, the action integral. This 
can be minimized by calculus of variations, so that instead of having to do some numerical 
nonlinear optimization, we find ourselves having to solve a second-order non-linear ordinary 
differential equation (ODE), which is a far easier task. The only issue to be dealt with is that 
the ODE is of shooting type, with boundary conditions at t = and t = T, so that some 
iterative solution scheme is needed: the analysis is explained in Section El 

The action integral is only the negative log-likelihood of the path in some formal sense, 
since it involves derivatives of the paths of the diffusion, and diffusion paths are not differen- 
tiable. Nevertheless, it is a useful heuristicH which leads to a rigorous approximation result 
for the limiting form of the maximum-likelihood path as /i J, : see Theorem [H 

This gives us a way to identify effectively the 'most likely' path x* of X given the path 
of Y, but we would also like to have some idea of how the random path of X is distributed 
around x*. This involves a study of the log-likelihood in a neighbourhood of x* which we 
find is (to leading order) quadratic in the perturbation ^ = x — x*. Thus the perturbation 
is approximately a Gaussian process, whose mean is identically zero, and whose law can 
be precisely characterized by expressing ^ as the solution of a linear stochastic differential 
equation: see Section HJ 

Inference on a hidden diffusion intensity for a point process is discussed in Section |5l 
and the relationship with SMC methodologies is discussed briefly in Section [61 Section [7] 
concludes. 

The idea of studying the action of a diffusion path is quite ancient already, and even as a 
tool for estimation there is a literature: see, for example, pQ, [2]. Markussen [1] arrives at 
essentially the same identification of the maximum-likelihood path as we do; the expression 
derived here for the Gaussian law of the perturbation is considerably more explicit. The 
approach of Archambeau et al [1] has superficial similarities, but is different in major respects; 
in particular, they use a minimum relative entropy criterion to identify a 'best' Gaussian 
approximation to the hidden process, whose structure appears to require the calculation of 
expectations of non-linear functionals of the path. In contrast, the approach followed here 
requires only the solution of ordinary differential equation^ This is particularly advantageous 
in higher dimensions, as the numerical solution of ordinary differential equations still works 
quite effectively in moderately large dimension, whereas methods such as particle filtering 

This heuristic leads in a few hnes to (non-rigorous) derivations of the Cameron-Martin-Girsanov change 
of measure, and to large-deviation rate functions for various diffusion asymptotics. 

^ Archambeau et al work with a more restricted diffusion dynamic (which they assert can be extended), 
and a somewhat different structure for the relation between observations and the hidden process. 
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struggle in dimension much above about seven, as we shall discuss in Section Ei 



2 The setting. 

Suppose that we have some stochastic differential equation (SDE) in 

dZt = (T{t, Zt) dWt + /i(t, Zt)dt, (1) 

where Z is partitioned Z = [X; Y] and where X is n-dimensional, Y is s-dimensional, n+s = d. 
We shall suppose also that a, and /i are in Cl- The problem we address is to estimate 
the hidden part {Xt)o<t<T of Z given observations of {Yt)o<t<T', the possibility that Y is non- 
informative about X is included in the analysis, so what follows applies to the distribution of 
an unobserved diffusion also. 

Closely related to ([T]) is the first-order Euler-Maruyama difference scheme 

dzt^ = a{t^, 4:^) dW, + zl:^) dt (2) 

where t„ = 2~"[2"t]. Despite appearances, this can be viewed as a discrete scheme; the 
increments of z have conditionally Gaussian distributions. It is known (see, for example, Mao 
[3]) that for some constant C 

E[ sup <C2-", (3) 

0<t<T 

SO by the first Borel-Cantelli Lemma we conclude that there is almost-sure uniform conver- 
gence of the processes z^"'^ to the solution Z to the SDE. 

In the discrete-time approximation ([2]) to the SDE, the conditional density of a;(j2~")o<j<2"T 
given |/(j2~")o<j<2"T can be written down immediately. The log-likelihood is 

1 2 

>^nix\y) = -| X] ^ I (^{jh, ZjhY^ ( Zjh+h - Zjh - hfi{jh, Zjh) ) | - 

j=0 
N-l 

= -I I] ^ I (^{jh,zjh)~'{ ^'^^^^ - Kjh,Zjh) ) r - V^(xo), (4) 

j=0 

where the prior density of Xq is exp{—ip{x)), and h = 2^". Inspecting (jlj), we see a difference 
quotient of z which it is tempting to replace with a derivative as n — )■ oo, leading to the 
formal limit 

A{x\y) = -3 / \ a{s,Zsy^{ Zs - n{s,Zs)) \ ds - ip{xo) (5) 

T 

iIj{s,Xs,Ps) ds - ip{xo) (6) 
3 




where the Riemann sum becomes an integral, and we write Ps = Xs- This is a perfectly 
sensible functional of a path z, even though for a diffusion process the path will not be 
differentiable. 

Our viewpoint is that the problem which concerns us is to estimate x(j2~'^)o<j<2''T in 
the discretely-sampled model ([2]), for some fixed (quite large) integer u, and that the SDE 
version of the problem is to be used to help us in this. In practice, we will only ever have the 
observation data y{j'2^'^)o<j<2''T in discretely-sampled form (how would it be stored if not?!) 
so this is the realistic question. When we minimize the log-likelihood dl]) over x, we only 
need the values of y at the multiples of h = 2~'^; we therefore lose no generality in supposing 
that y has been interpolated (by cubic splines, say) to be in all of [0,T]. This does not of 
course change @, but it does mean that the functional A is meaningfully defined for any 
path {xt)o<t<T- Given this, we have the following result, where to emphasize again, we are 
considering what happens as n > z/ tends to infinity, with u fixed. 

Theorem 1 Suppose that a, a^^ and ji are C^, and that {yt)o<t<T is also . Suppose further 
that 

lim (^(x) = oo. (7) 

|a;|— >-oo 

Then 

lim inf{ ~\n{x\y) ] = inf{ -k{x\y) ]. (8) 

n— >oo X X 

Assuming that A(-|?/) has a unique maximizer x* , and that Xn is a maximizer ofXn{-\y), then 
Xn — >■ X* uniformly. 

Proof. See Appendix S 

The usefulness of Theorem [T] is that it shows that a minimizer of —Xn{'\y) is very close to 
the (assumed unique) minimizer of — A(-|y); but this minimizer can be found by calculus of 
variations without resort to some high- dimensional non-linear optimizer. 

Notice that although we make a discretization error when we replace the SDE ([1]) with 
the Euler-Maruyama scheme (|2]), in most examples of practical interest the magnitude of 
this discretization error will be insignificant compared to the observation error that we are 
attempting to see past. 

3 Finding the least-action path. 

Theorem [T] shows that asymptotically as the step size h = 2~" tends to zero, the maximum- 
likelihood estimate of the unobserved path converges to the maximizer of the continuous-time 
analogue A. Resorting now to calculus of variation techniques leads us to the following central 
result. 
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Theorem 2 The path {x,p) which minimizes the action functional 

i-T 



A{x\y) 



i){s,Xs,Ps) ds + Lp{Xo) 



(9) 



satisfies the following ODE with boundary conditions: 






= L'p.?/^(0,xo,Po) - 


D.^^Lp{Xo) 


(10) 







- XkDp^D^^ip - pkDp^Dp^ip 


(11) 





= Dp^i){T,XT,PT) 




(12) 



Proof. The negative log-likelihood to be minimized is 



^(xo) + / i){t,xt,xt) dt = Lp{xo) + / ip{t,xt,pt) dt, 
'o ^0 



(13) 



where we write p = x. This we attack by calculus of variations; if we have found the optimal 
X, then any perturbation to x + ^ must to leading order make zero change to the objective. 
Writing down the first-order change and integrating by parts gives 



= ioD^{xo)+ {ifD,i, + i-Dpi,]dt (14) 
Jo 

pT 

= ^oD^ixo) + [etDp^i^] + / ^t{D.,iJ - DtDp^tP - x^Dp^D^J - pkDp^Dp^i:} dt. 

Jo 

Since ^ is arbitrary, we deduce the conditions ffTOj) . f lTT]) . f|T2|) for optimality. □ 



Remarks, (i) We have found a (generally non-linear) first-order ODE ( 1TT|) for {x,p), with 
initial condition f IlOp and terminal condition ( IT^ . This will generally have a unique solution, 
though the 'shooting' nature of the ODE is rather clumsy in practice. 

(ii) We expect that this methodology will be advantageous in higher-dimensional problems; 
algorithms for solving ODEs in high dimension tend to degrade less rapidly than (for example) 
gradient optimization methods, or particle filtering. 

(iii) If the time horizon T is too large, it may be that the shooting ODE is not able to identify 
the initial and terminal conditions with sufficient accuracy. 

To illustrate the methodology in action, we present here some simple examples. 

Example 1. Here we take independent one-dimensional Ornstein-Uhlenbeck processes X, y 

dX = axdW-PxXdt 
dy = aydW - PyYdt 
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where ax = 1-053, ay = 1.0127, = 0.1054 and /Sy = 0.0253. We observe Y = X + y. Then 
Z = [X; Y] solves 

= {Z ){dW') + { -Pxlpy \ ){y '^^ 

which is a simple linear SDE. For this example, 

^ I I P \ A ( ^ \ \ I I P \ A ( ^ 



A sample path of the SDE was simulated, and the results are displayed in Figured! The true 
path is dashed in green, the noisy observation is in blue, and the least-action path is in red. 

Example 2. This time we have a two-dimensional linear example, with X satisfying 

dXt = dWt + K ~M Xt dt = dWt + AoXt dt, 



1 ^ 

and Y = X + y, where y is an independent OU process 

dyt = adwt — Xyt dt, 

with A = 0.005, a = 20. The results of the least-action analysis are displayed in Figure 
[21 Once again, the estimates are quite impressive, but since the underlying dynamics are a 
perturbation of uniform motion in a circle, this is perhaps not so surprising. 

Example 3. This is another one- dimensional example, but this time non-linear. We have 
dynamics 

dXt = axdWt — 6sin(aXj) dt 

for X, where ax = 3, 6 = 12 and a = 27r/5. Thus the drift tries to keep the diffusion near 
to multiples of 5. The observation Y is as before of the form Y = X + y where y is an 
independent OU process 

dyt = dWl - Xyt dt 

with A = 0.05. The results of the analysis are shown in Figure [31 As can be seen, the 
hidden Markov process stays close to multiples of 5, occasionally moving from one value to a 
neighbouring value, and as the observational error gets larger, the filtering performs less well. 

Example 4. The final example is a more demanding test of the methodology. Here, we take 
a two-dimensional non-linear dynamic for X 

dXt = Y.dWt -hsm{aXt) dt 
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where 

a, b as for the previous example. The observation is univariate, and takes the form Y = v-X+y 
where y is an independent OU process 

dyt = crdwt — Xyt dt 

where v = (1,2), A = 0.05, and a = 1. It would be indeed remarkable if the methodology 
was able to unscramble the two hidden components of the diffusion from observation of just 
one component, and the results are shown in FigHl sometimes the estimate is close to the 
true values, sometime less so. But it should be understood that this is not a limitation 
of the methodology, which is after all discovering the maximum-likelihood estimator of the 
hidden process; the relatively poor recovery of the hidden path is because we have a small 
signal-to- noise ratio for this example. 



4 Second-order analysis. 

The first problem we focused on is maximizing the log-likelihood as given for the discretized 
problem by (j4j). The unknown in this situation is the sequence X = {xj2--^)o<j<2-'T, and as in 
classical asymptotic ML theory, having identified the ML estimator X of the unknown X, we 
can enquire about the distribution of X about X. This amounts to understanding the second 
derivative of the log-likelihood at the MLE. Given suitable regularity of the likelihood, a Taylor 
expansion shows that the distribution of X around X will be approximately Gaussian, with 
covariance equal to the second derivative of the log-likelihood. However, we have seen that 
there are considerable methodological advantages in passing from the discrete problem to 
the continuous analogue ([6]), and these advantages apply also at the level of the second-order 
terms in the expansion of the log-likelihood, as we shall now see. 

Theorem 3 Suppose that x* is a local minimizer of the functional A{-\y), and define the 
matrix-valued n x n functions of time 

Ai^ = D^^D^^^P, Bl^ = D^^Dp^^P, qi^ = D^^D.^i;, (16) 

evaluated along the path x* . Then: 
(i) the ODE 

A + et = KjqtKt = (Bt + et)q^\Bj + 6^), Or = (17) 
has a unique solution; 
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(ii) writing x = x* + C,, we have that 

- Aix\y) + Aix*\y) = Qo(0 + o{ + ds), (18) 

Jo 

where 

QoiO = Ko ■ { D'vi4) + Oo }eo + r i {it + KS) ■ qtiit + Ktit) dt, (19) 

Jo 

and where 

Kt = q^\Bj + et). (20) 

1 /2 ' 

Remarks, (i) If we write Wt = (Ct + Kt^t), then the integral term in flTIJl) takes the 
form |wtp c?^, which is the action integral for standard Brownian motion. Thus informally 
what we learn from Theorem [3] is that the perturbation ^ 'looks like' a continuous zero- mean 
Gaussian process obtained by solving the linear SDE 

d^t = -Kt^tdt + q;'^'dWt, (21) 

and started with initial precision D^y9(xg) + ^^o- The covariance of ^ can thus be obtained from 
an Ito expansion of 

d^^ = {-KMJ - i,ijKj + q-^)dt, (22) 
where = signifies that the two sides differ by a (local) martingale. Hence 

Vt = -KtVt - VtKj + (23) 

and this allows us to calculate the covariance at time t, again by solving an ODE. 

(ii) The ODE flTTl) for 9 is quadratic, so there is no general result which guarantees that it 
has a solution; the solution may explode. However, in this situation we can be sure that it 
will not, because A(-|?/) is minimized at x*, as we shall see in the proof. 

(iii) It has to be admitted that the relationship between Theorem [3] and the original discretized 
problem is somewhat tenuous. With effort it would no doubt be possible to establish an 
analogue of Theorem [T] for the second-order effect, but it is harder to see what the use of such 
a result would be. The way we envisage using Theorem [3] would be to generate an approximate 
distribution for the posterior of X given y, which could be checked numerically against a full 
posterioiH, and which could be used to give approximate moments of the posterior law of X . 

^It would be rare that this could be found in closed form, so we would typically need to resort to particle 
filtering to make a numerical approximation. 
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Proof of Theorem [31 Let us begin by assuming the first statement (i) of the theorem 
and showing how this leads to the more interesting statement (ii). We will return to prove (i) 
subsequently. 

If we go back to the action functional ([9]) and consider expanding around the local mini- 
mizer x* by perturbing to x* + ^ for small ^, the second-order contribution we get is 

Jo 

where the matrix- valued functions of time A, B and q are defined by ( IT6|) . This quadratic 
functional of ^ characterizes the (approximate) Gaussian distribution of the perturbation. Let 
us notice that because x* is assumed to be a minimizer of the action functional, this quadratic 
functional of ^ must be non-negative. 

Now suppose that we have some symmetric-matrix-valued function of time, 9, such 
that = 0. Then we may write 

Q{i) = Q{0 + Ho-Oo^o+[HfOS]l 

= Ko-(/^V(a:S) + «o 
Jo 

The quadratic form inside the integral can be expressed as 

I it ■ qtit + 6 ■ {Bt + et)it + 1 6 ■ (A + mt = \{it + ks) ■ gtii + ks), (26) 

where Kt = q^^{Bj + 6t), provided 

At + et = KjqtKt = {Bt + 9t)q;\Bj + 9^), (27) 

that is, provided 9 solves the ODE (1171) . Therefore we have shown that the second statement 
of the theorem follows once we have established the first. 



We now turn to the issue of the existencqj of a solution to f|T7j) . The proof that the 
quadratic ODE (|T7|) has a solution exploits the only piece of information we have about 
the quadratic form Q, namely that it is non-negative, since x* maximized K{x\y). We may 
therefore consider the optimal control problem 

G{t, x) = inf I ^ { ■ ^6 + L ■ BsL + Hs- QsL] ds:^t = x^ (28) 



^Uniqueness of the solution is not a problem, because the right-hand side of the ODE p7)) is locally 
Lipschitz in 6*, and once we know that there exists a solution this is enough to guarantee uniqueness. 
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which is well posed, and the solution G has a quadratic form: 

G{t,x) = ■ Htx, (29) 

which is centred, since clearly G{t, 0) = for all t. Since G is the value function, we shall 
have that ^ 

h{t)= [ {Hs-As^s + ^s-Bst + Hs- qsi}ds + G{t,^t) (30) 

is non- decreasing whatever ^ we choose, and will be constant if we choose the optimal ^. 
Differentiating (130|) gives us 

ht = i^t- At^t + 6 ■ Bdt + Uf Qtit + Hf Htit + ^tHtif (31) 
Minimizing over and equating to zero to find the optimal gives us 

it = -qi\Bj^, + H,^,), (32) 
so that the minimized value of ht becomes 

\it ■ Atit + 16 ■ Htit - 1 {Bjit + H,it) ■ ql\Bjit + Htit). 
At optimality, this must be zero; so 

^ = At + Ht-{Bt + Ht) qi\Bj + Ht). (33) 

Evidently from the definition we have (^(T, ■) = 0, so that H solves (|T7|) with the zero 
boundary condition at t = T. In other words, we have found 6', and 9 = H . Notice in 
particular that the equation fl5^ becomes ^ = —Kt^t- 

□ 

Numerical aspects. 

Theorem [3] characterizes the behaviour of the path x around the local minimizer x*, but in 
any given example, this is not the end of the story. The point is that when we seek x* we 
resort to numerical mean^ to solve the ODE f|TT]) subject to the boundary conditions ffTOj) . 
f|T2|) . and it may be that the solution found is not the global minimizer. It could be that 
the solution obtained is only a local minimizer; it could be that the solution found is a only 
stationary point of A(-||/); or it could be that the numerical method has got into difficulties 
and what it returns is not a solution. An obvious approach to this problem is simply to try 
numerical solution of flTTl) back in time from initial value 9t = 0; however, if the ODE solver 
fails, usually the entire calculation stops, and no attempt to continue is possible. We can 
avoid this issue if we transform the first-order non-linear Riccati ODE into a second-order 
linear ODE. The following result gives all we need. 

""All the calculations for this paper were performed in Scilab www.scilab.org which is a free and very 
versatile package able to perforin a wide range of numerical and graphical tasks. 
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Proposition 1 Suppose that x* is a solution to (fTUj) . ( ITTl) . ([T^ . anc? i/iai A, B and q are 
defined in terms of x* by f lTB]) . Then: 

(i) If the ODE f|T7|) /ias a unique solution, then the n x n-matrix-valued function Ft defined 
as the unique solution to the first-order linear ODE 

Ft = -KtFt = -qt\Bj + 9t)Ft, Ft = I (34) 

solves the second-order linear ODE 

= {At- Bj)Ft + {Bt - Bj - qt)Ft - qth (35) 

with boundary conditions 

Ft = I, B^Ft + qT^T = 0. (36) 

(a) If the (unique) solution to fl35l) with boundary conditions fl36|l is non- singular for all 
time, then 9 defined by 

Bj + 9t = -qtFtF,-^ (37) 
solves (fT71) . and is the unique solution. 

(Hi) If the (unique) solution to fl35p with boundary conditions fl36p fails to be non-singular 
for all time, then x* cannot be a local minimizer of A{-\y). 

Proof, (i) Suppose that 9 is the unique solution to ffT7|) . and use this to define a function 
F using flMl) with conditions fl5Bl) at T. Since ([31]) is a hnear ODE with bounded continuous 
coefficients, it has a unique solution. Multiplying (134|) on the left by qt and differentiating 
leads to 

-qP-qF = {B^ + 9)F + {B'^ + 9)F 

= {B^ - A+{B + 9)q-\B^ + 9))F + {B^ + 9)F (38) 
= {B^ - A)F - {B + 9)F + {B^ + 9)F 
= {B'^ - A)F -{B- B^)F 

which is the ODE (15^ . as required. Here, we have used (fT7|) at line ( 1551) . 

(ii) Using (l37j) to deffne a function 9 of time, we see from (136|) that 6't = 0, so it remains to 
check that 9 so defined satisfies ( !T7|) . Multiplying on the right by F and differentiating ( 137|) 
gives us 

(^^ + ^)F + (S^ + 9)F = -qP -qP = {B'^ - A)F + (5^ - B)P (39) 
leading to the equation 

{A + ^)F = -{B + 0)F = (5 + 0) g"^(5^ + 9)F. (40) 
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Since F is assumed to be invertible for all t, we can clear out the common factor of F on each 
side and recover the ODE f[T7l) for 6. 

(iii) Suppose that x* were a local minimizer of A(-|?/); then by Theorem [21 the ODE fll7p has 
a unique solution. We may therefore define anx n matrix- valued function ft of time to solve 

ft = -Ktft, fT = I 

where Kt = q~[^{Bj + 6t). Notice that K is bounded on [0, T], so we may also define the n x n 
matrix-valued function ^pt by 

Lpt = ^PtKt, ^Pt = I- 

Calculus shows that iptft is constant, and therefore equal to /; in particular, ft is always 
invertible. According to the first statement of the Proposition, / solves the ODE (135 p subject 
to the boundary conditions f l36|) . which has a unique solution. But by hypothesis, this solution 
becomes singular at some time in [0,T], which is a contradiction. The conclusion follows. 

□ 

Remarks. The way Proposition [1] gets used in practice is that we firstly identify what we 
think may be the action-minimizing path x* and then we solve the second-order linear ODE 
(135 p with the boundary conditions fl5B|) . There is never any problem with this ODE, since 
it is linear with bounded continuous coefficients. Having solved for F, we now calculate the 
determinant of Ft along the path of F. If this ever vanishes, it indicates that x* was not a 
local minimizer of Qq. 



5 An extension. 

We have so far considered a situation where a diffusion X is observed through some continuous 
process Y of observations, and we are required to estimate X. Another quite natural situation 
we might want to consider would be where the hidden process X is some positive diffusion, 
which acts as the intensity process of some observed point process. So suppose that there is 
some non-negative diffusion x satisfying 

dxt = a{t, xt) dWt + xt) dt (41) 

which serves as the stochastic intensity of a counting process N . The times < ri < r2 < 
. . . < tatj, < T of events are observed, and we have to filter x from these observations. In this 
case, the action integral ( = -log-likelihood) to be minimised is just 

(f{xo)+ / ^Ij{s,Xs,Xs) ds + / - log , (42) 

Jo Jo ,_i 
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where we have abbreviated n = N^, and ijj is as before 

iIj{s,x,p) = {p- fi{s,x)y/2a{s,xf. 



(43) 



Once again, we consider a small perturbation ^ away from the optimal x, and equate the 
first-order change to zero. Assuming for simplicity that a and a are independent of time gives 



= <^'(xo)eo + / {^.e + M +0dt-y2 

-^0 i=l 



Ti 



1=0 -^^J i=l 



XiTi 



j=0 



Tj + l 



{i^x - - tpppP + 1}^ dt + ^ti^piXt, Xt) 



Tj + 1 



E 



i=l 



XiTi 



In order for this to be zero whatever perturbation ^ is used, we have to have a number of 
conditions: 



= ipx — i^pxi — i^ppP +1 in each interval {tj, Tj+i) 

= (f'ixo) -i'pixo,Po) ; 

= -lljp{Xr,,Pr^ + ) + i)p{Xr,,Pr,-) " x{TiY^ ] 

= ijp{xT,PT)- 



(44) 
(45) 
(46) 
(47) 



Thus the least-action path must be constructed piecewise in each of the intervals between 
observations. 

Example 5. To illustrate this situation, we suppose that the positive diffusion x is a log- 
Brownian motion: 

dxt = Xt{adWt + ndt). (48) 
Some calculations reveal that the ODE (HH) is 







p^ — xp + a'^x^ 



(49) 



Notice that any solution of the ODE f l49|) must be convex while it is positive. The non-linear 
ODE (1491) has various solutions, and can indeed be solved explicitly. However, the explicit 
solutions are explosive, so it turns out that to calculate the solution it is more effective to 
work numerically, which allows us to truncate the coefficients to prevent explosions. Once a 



^We write tq = 0, r„+i = T. 
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solution is found, the truncation is no longer necessary. Since ipp = {p — iix)/a'^x'^, the final 
condition fH7|) tells us that we must have pt = ^^XT, and f H^ gives that 

APr. = Pn+ - Pn- = -(^^Xr^, (50) 

which determines the change of derivative over the observation points. Since the gradient 
drops at the observation points, this allows that the solution x found does not have to be 
convex globally, even though we have observed that it must be convex in each interval between 
observations. 

Figure \5\ illustrates the results of applying the method to this example. Despite the fact 
that there were only 49 points observed in the observation period, the general shape of the 
hidden intensity is well captured. 



6 Relationship to particle filtering. 

For the kind of problems discussed in this paper, the particle filtering methodology provides 
a natural approach. But some simple thought experiments highlight significant limitations 
of the methodology, and suggests ways in which the approach of this paper could in some 
circumstances supplement or replace conventional particle filtering. 

Arguably one of the greatest strengths of particle filtering is that it evolves an estimate 
of the posterior distribution of the hidden variables; this guards most effectively against the 
common mistake of underestimating the error in parameter estimation. However, the method- 
ology though simple is hard to apply effectively. One place where problems arise is in evolving 
the posterior distribution (represented by a finite collection of point masses) forward to 
the next time point. The simplest version of particle filtering allows each point to make a 
jump according to the Markovian transition mechanism, and then reweights the new particles 
according to their likelihood given the new observation. One often find^lj that the likelihood 
of almost all (or indeed, all) the new particles is tiny; most of the new randomly-selected 
particles are miles away from the observation. 

To understand the magnitude of the effect, imagine that there is a ball B of radius b 
hidden in the unit cube C = [0, 1]"^ in R*^; think of this ball as the set where f{y\x) > e, 
where f{-\x) is the density of the new observation given the true state x of the hidden Markov 
process, and e > is some fixed threshold value of likelihood. If points are cast at random 
into C, how many on average will be needed before one hits the ball B7 The answer of course 
is just b~'^V^^, where 

_ (27r)'^/2 2-^^/2 

r(i + id) 

^The signal processing literature refers to Sequential Monte Carlo (SMC) methods. 
^"^This is particularly problematic when there is Gaussian observation error. 
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is the volume of the unit ball in M'^. If we suppose that the radius is b = 0.1, representing 
a moderate amount of observational noise, Figure M plots the mean number b~''^V^^ against 
dimension d, with the line where the mean number is one million plotted as a red dashed line 
across the figure. This crosses the plot at dimension roughly equal to 7. Thus if we want 
on average one particle to fall in the ball of radius 0.1 around the observation in dimension 
7, we will require about one million attempts. Now of course we would not in practice just 
mindlessly fire particles completely at random into C; various forms of importance sampling 
will hugely improve the success rate. But this is not quite as straightforward as it first appears. 
If we had a representation for the observation as 



where X is the hidden state of the Markov process, and St is a Gaussian observation error, 
then we would bias the jumps from the particles in the population at time t — 1 towards the 
new observation Yf. However, the observation equation (15T|) is not how it is in most examples; 
more generally we have 



for some very complicated function $, and in order to bias the jumps of the particles towards 
values of x which are likely relative to Yt we have to know what values of x make close 
to Yt. In other words, we have in effect to find the ML estimator of Xt given Yt. Thus the 
particle-filtering methodology, which is wholeheartedly Bayesian in philosophy and execution, 
is often forced to resort to frequentist methodology in practice to help it overcome the curse of 
dimension. The least-action approach advanced here is such a ML method, and could be used 
to pick out a most likely path x* given a path Y of observations; the second-order analysis 
would tell us how to simulate paths around the most likely path x* so as to generate a sample 
of paths which would have a reasonably high likelihood. 

7 Conclusions. 

This paper offers an approach to estimating a hidden diffusion process observed either through 
other components of a joint diffusion, or through a point process whose intensity is the hid- 
den diffusion. The approach is in effect a maximum-likelihood approach, but because of 
the continuous time parameter, calculus-of-variations techniques can be applied to identify 
the least-action (=maximum-likelihood) path. Moreover, by taking the calculus-of-variations 
analysis to second order, we are able to find not only what the most likely path is given the ob- 
servations, but also approximately what is the distribution of the hidden path about the most 
likely path. One point which is worth emphasizing is that in contrast to other approaches, this 
methodology works without any assumption of symmetrizability of the diffusion, and requires 
only calculus in W^. The numerical methods involve shooting solutions of ODEs, so this can 
be technically quite delicate, especially if the time interval over which the solutions are to be 



Yt = Xt + et 



(51) 




(52) 
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constructed is long. Accordingly, it cannot yet be claimed that this methodology will work 
well in all conceivable situations. However, it is a methodology which will in principle work 
in quite high dimension, and for this reason shows considerable promise. 



16 



A Appendix 



Proof of Theorem \Ti li z [0,T] R'^ is then let ^("^ denote the piecewise-hnear 
approximation which matches z at each multiple of /i = 2~". TheiJ^ we have 

N-l (n) _ in) 

T ^ 

I (y{Sn, ZsX^ ( - /^(Sn, Zs„) ) \ ds - ^{Xq) 

T ^ 

-1 I \ o-{s, Zs)~'^{ Zs - iJ,{s,Zs)) \ ds-ip{xo) 
Jo 

= Hx\y) 

by dominated convergence. Hence we have 

limsup inf| — A„(x|y) } < inf| —A{x\y) }. (53) 

In the other direction, let L„ = infa^j —Xn{x\y) }, and suppose that x^"-' is close to minimizing 
—Xn{-\y) in the sense that 

-A„(x(")|y) <L„ + 2-" (54) 

We then extend x*-"-* by piecewise-linear interpolation to create a path z^^^ defined throughout 
[0,T]. We shall compare the discrete integral 

7V-1 (n) _ (n) 

D ^ kj:h\aijh,z%Y\ "''^\ -f^ijh,z\^))\' 

= h [ |a(.„,z£))-i(iW-/i(.„,zW)) I'rf. (55) 
Jo 

T 

la^p ds 

with the continuous integral 

C ^ 1 /V(.,^("))-i(i(")-M^,4"^)) (56) 

= [ \bs\^ds 
Jo 

"Recall that s„ = 2-"[2"s]. 
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and show that the two are very close for large n. Notice that the only difference between (1551) 
and (jSSD is that the time index passes through the discrete values in the first, but moves 
continuously through [0,T] in the second. The comparison goes via 



\D-C\ < f\\a,\^-\h:^ 
Jo 



ds 



T 







\{as - bs) ■ {as + bs) \ ds 
T \ / i-T 

2 



< \as-hsVdsjyj \as + hsYdsj (57) 

by showing that the first factor in flFTl) is small, and the second is bounded. Firstly we exploit 
fl5^ to give for some positive finite constants F and 7 and < t < m < T the bounds 



/u 

> 7 r{'At^\'-\f-^is^,&\']ds. 



Hence we have the inequalit\l^ 

f ds<T (55 



which yields the modulus of continuity 



< rVu^- (59) 

It is easy to see that the analogous inequality 

/u 



(60) 



also holds. The boundedness of the second factor in (157|) now follows. Using the modulus 
of continuity, it is now straightforward to show that the first factor in (157|) can be made 
arbitrarily small by taking n arbitrarily large. 
Hence given e > 0, for large enough n we have 

inf { -A{x\y) } < -A(x(")|y) < -A„(a;(")|y) +e<L^ + 2-^ + e. 



denotes as usual a positive finite constant whose value matters little, and may change from line to line. 
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We conclude that 

infj —A{x\y) } < liminf L„ 

X n— >oo 

which combines with fl53|) to give the statement ([8]). 

For the final statement of Theorem [1], we use the condition ([7]) along with the modulus of 
continuity proved above. This shows that the family is equicontinuous, so by the Arzela- 
Ascoli Theorem is relatively compact in the topology of C[0,T]. Any subsequential limit of 
the Xn converges to a path for which the value of —A{x\y) is minimal; but by assumption 
there is only one such path, x*, and so any subsequential limit of the Xn must be x*, hence 
the Xn converge in the (uniform) topology of C[0, T] to x*. 

□ 
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OU process X observed as Y+z, where z is OU 




Figure 1: Result of least-action filtering. 
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2d BM under steady rotation, observed as Y + z, where z is OU 

200 I , 




2d BIVI under steady rotation, observed as Y + z, wliere z is OU 




Figure 2: Result of least-action filtering. 
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dX = sig_X*clW - b*sin{a*X)*dt, Y = X + OU-noise 




Figure 3: Result of least-action filtering. 
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dX = sig_X*dW - b*sin(a*X)*dt, Y = v.X + OU-noise 




Figure 4: Result of least-action filtering. 
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Figure 5: An example of least-action estimation of a hidden diffusion. 
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LogJO of number of particles to get within 0.1 of hidden point 
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Figure 6: Limitations of particle filtering. 
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